
About the Provider
DeepSeek is a Chinese artificial intelligence company based in Hangzhou, Zhejiang, that focuses on research and development of large language models and advanced AI technologies. The firm emphasizes open innovation in AI, publishing models and research under permissive licenses to make powerful language models widely accessible and to support collaborative development in the global AI community.

Model Quickstart
This section helps you quickly get started with the deepseek-ai/deepseek-r1-distill-llama-70b model on the Qubrid AI inferencing platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
With these in place, you can send requests to the deepseek-ai/deepseek-r1-distill-llama-70b model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. You can choose the one that best fits your workflow.
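As a starting point, here is a minimal Python sketch of such a request. The endpoint URL, response shape, and field names below are assumptions modeled on common OpenAI-style chat APIs, not confirmed Qubrid specifics; check the Qubrid API reference for the exact base URL and schema.

```python
import os
import requests

# NOTE: placeholder URL -- consult the Qubrid API reference for the
# actual base URL and request schema.
API_URL = "https://api.qubrid.ai/v1/chat/completions"


def build_payload(prompt: str, temperature: float = 0.3,
                  max_tokens: int = 10000) -> dict:
    """Build a chat-completion request body for the distilled model."""
    return {
        "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(prompt: str) -> str:
    """Send a prompt and return the generated text.

    Assumes an OpenAI-style response shape; verify against the
    Qubrid docs before relying on it.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"},
        json=build_payload(prompt),
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# Example: inspect the request body before sending.
print(build_payload("Explain model distillation in one sentence."))
```

The payload builder is separated from the network call so you can log or adjust the request body before dispatching it.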
Model Overview
DeepSeek R1 Distill Llama 70B is a distilled large language model optimized for efficient, high-level reasoning and conversational intelligence. It is trained by distilling high-quality reasoning outputs from DeepSeek-R1 into a 70B LLaMA-based architecture, delivering near-frontier analytical performance while running on significantly smaller hardware than full-scale models.

Model at a Glance
| Feature | Details |
|---|---|
| Model ID | deepseek-ai/deepseek-r1-distill-llama-70b |
| Architecture | LLaMA-3.1-70B (Distilled) |
| Model Size | 70B parameters |
| Training Data | Distilled from DeepSeek R1 high-quality reasoning outputs with LLaMA 70B |
| Context Length | 64K tokens |
When to use?
Use DeepSeek R1 Distill Llama 70B if you need:
- Strong reasoning and chain-of-thought capabilities for complex tasks
- Long-context support up to 64K tokens
- Efficient deployment compared to full, non-distilled frontier models
- Open-source licensing suitable for on-premise or custom deployments
- Reliable performance across math, logic, coding, and research workflows
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.3 | Controls creativity and randomness; higher values produce more diverse output. |
| Max Tokens | number | 10000 | Defines the maximum number of tokens the model is allowed to generate. |
| Top P | number | 1 | Nucleus sampling; restricts token selection to the smallest set of tokens whose cumulative probability reaches the given mass. |
| Reasoning Effort | select | medium | Adjusts the depth of reasoning and problem-solving effort; higher values increase response quality at the cost of latency. |
| Reasoning Summary | select | auto | Controls verbosity of reasoning explanations: auto, concise, or detailed. |
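The parameters above might map onto a request body as sketched below. The snake_case field names are assumptions following common OpenAI-style conventions; confirm the exact names against the Qubrid API reference.

```python
# Hypothetical request body illustrating the inference parameters above.
# Field names are assumed, not confirmed Qubrid API specifics.
params = {
    "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
    "stream": True,                # Streaming: emit tokens as they are generated
    "temperature": 0.3,            # lower = more deterministic output
    "max_tokens": 10000,           # upper bound on generated tokens
    "top_p": 1,                    # nucleus sampling mass (1 = no truncation)
    "reasoning_effort": "medium",  # deeper reasoning raises quality and latency
    "reasoning_summary": "auto",   # auto / concise / detailed
}
print(params)
```

With `stream` set to true, the response typically arrives as incremental chunks rather than a single JSON body, so your client must read the connection accordingly.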
Key Features
- High-Quality Reasoning: Optimized for strong reasoning and chain-of-thought capabilities, suitable for complex tasks.
- Long-Context Support: Can handle up to 64K tokens, enabling processing of very large inputs.
- Efficient Deployment: Distilled model runs efficiently compared to full 70B models, reducing hardware requirements.
- Configurable Inference: Supports adjustable parameters like temperature, streaming, reasoning effort, and verbosity for flexible and precise outputs.